Abstract
Background: Acute myeloid leukemia (AML) is a hematopoietic stem cell malignancy characterized by clonal proliferation and reduced differentiation of myeloid progenitor cells. The complex interactions among clinical, cytogenetic, and molecular features result in variable treatment responses and outcomes. Accurately predicting response to available therapy is an important first step in managing patients with newly diagnosed AML. Those with refractory disease generally have shorter survival. While existing risk stratification systems such as European LeukemiaNet (ELN) 2022 or ELN 2024 offer broad prognostic categories, they often fail to provide patient-specific predictions. Bayesian networks, which model conditional dependencies among variables, offer a promising framework for individualized outcome prediction. In this study, we developed a machine learning model using Bayesian networks to estimate the probability of achieving response (complete response [CR]/CR with incomplete hematologic response [CRi]/morphologic leukemia-free state [MLFS]) after frontline therapy in AML patients.
Methods: A retrospective cohort of AML patients diagnosed between 5/2011 and 11/2023 and treated at Moffitt Cancer Center with either frontline intensive chemotherapy (cytarabine with anthracycline [7+3]) or a hypomethylating agent with venetoclax (HMA/VEN) was identified. A total of 55 features were assessed, including but not limited to demographics (age, race, performance status), antecedent myeloid malignancy, laboratory values (white blood cell count, neutrophils, hemoglobin, platelets), and pathology (marrow blasts, cellularity, cytogenetics, mutational profile). Feature selection integrated a relative importance ranking from the Random Forest algorithm, a Markov blanket approach, and expert knowledge to identify an appropriate subset of the 55 variables for response status prediction. A Bayesian network was trained on 80% of the dataset using the selected features, with the remaining 20%, chosen by randomization, reserved for independent testing. The directed acyclic graph was visualized using Netica software to facilitate interpretation of variable dependencies. Model performance was assessed using cross-validation and area under the receiver operating characteristic curve (AUC). The 95% confidence interval (CI) for the final Bayesian network's performance was obtained from 2000 stratified bootstrap replicates.
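The stratified bootstrap used for the AUC confidence interval can be illustrated with a minimal sketch. It assumes a percentile interval and resampling within each response class so that every replicate keeps the original responder/non-responder counts; the abstract does not state which interval type was used. The function names (`auc`, `stratified_bootstrap_ci`) and the data are hypothetical: the scores are synthetic and are not study results.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a responder outscores a non-responder (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def stratified_bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the AUC; each class is resampled with replacement separately."""
    rng = random.Random(seed)
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    stats = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))  # class sizes stay fixed
        boot_neg = rng.choices(neg, k=len(neg))
        boot_labels = [1] * len(boot_pos) + [0] * len(boot_neg)
        stats.append(auc(boot_labels, boot_pos + boot_neg))
    stats.sort()
    lower = stats[round((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, round((1 - alpha / 2) * n_boot))]
    return lower, upper

# Synthetic predicted response probabilities (assumption: illustration only).
demo_rng = random.Random(42)
labels = [1] * 30 + [0] * 30
scores = ([demo_rng.betavariate(4, 2) for _ in range(30)]
          + [demo_rng.betavariate(2, 4) for _ in range(30)])

point = auc(labels, scores)
lo, hi = stratified_bootstrap_ci(labels, scores)
print(f"AUC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Resampling within each class avoids replicates that contain only one outcome, for which the AUC is undefined.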
Results: A total of 651 patients (7+3: n=322, HMA/VEN: n=329) were identified. The final model retained 10 key features including age, prior solid tumor/myeloid malignancies, cytogenetic risk by ELN, chemotherapy intensity, and mutations in NPM1, IDH1, IDH2, RUNX1, and TP53. The Bayesian network predictive model achieved an AUC of 0.74 (95% CI: 0.688-0.781), with an overall accuracy of 0.70 in the test set. The model depicted biologically plausible pathways underlying remission and revealed interactions between genetic mutations and clinical phenotypes associated with treatment response. Visual representation of the network highlighted interpretable pathways through which individual features influenced response probability. Calibration analysis demonstrated close alignment between predicted and observed response rates across risk groups.
Conclusion: This study demonstrates that Bayesian networks, combined with our comprehensive feature selection method, provide a transparent and accurate approach to predicting frontline therapy response in AML. The model's ability to incorporate probabilistic reasoning and visualize variable dependencies offers distinct advantages over black-box classifiers, supporting its potential utility in clinical decision-making. These results suggest that integrating Bayesian network models into diagnostic workflows may enhance personalized treatment strategies and enable earlier identification of patients at high risk of frontline treatment failure. Future directions include external validation and expansion of the model to incorporate other treatment types, measurable residual disease (MRD), and longitudinal data.